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A new method, based on Bayesian analysis, is presented which unifies the inference 
of plasma equilibria parameters in a Tokamak with the ability to quantify differences 
between inferred equilibria and Grad-Shafranov force-balance solutions. At the heart 
of this technique is the new method of observation splitting, which allows multiple 
forward models to be associated with a single diagnostic observation. This new idea 
subsequently provides a means by which the the space of GS solutions can be ef- 
,— 1 ficiently characterised via a prior distribution. Moreover, by folding force-balance 

directly into one set of forward models and utilising simple Biot-Savart responses in 

\o 

another, the Bayesian inference of the plasma parameters itself produces an evidence 

CO 

(a normalisation constant of the inferred posterior distribution) which is sensitive to 
^ the relative consistency between both sets of models. This evidence can then be used 

to help determine the relative accuracy of the tested force-balance model across sev- 
eral discharges/times. These ideas have been implemented in a code called BEAST 
(Bayesian Equilibrium Analysis and Simulation Tool), which uses a special imple- 
mentation of Skilling's nested sampling algorithm [Skilling, Bayesian Analysis 1(4), 
833-859 (2006)] to perform sampling and evidence calculations on high-dimensional, 
non-Gaussian posteriors. Initial BEAST equilibrium inference results are presented 
for two high-performance MAST discharges. 
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I. INTRODUCTION 



Reconstruction of the equilibrium magnetic field geometry is of central importance in 
both the control and analysis of tokamak plasmas. Indeed, precise, open-loop control and 
reliable inference from many disparate diagnostics systems will be an absolute necessity 
for scientists seeking to tune and control future tokamak experiments to sustain so-called 
burning plasmas; at the core of both these needs lies equilibrium reconstruction. 

Many advances have been made in the area of equilibrium reconstruction since the 1970s, 
most of which have focused on making solving the Grad-Shafranov (GS) equation a fast 
computation.'^ While these efforts have resulted in equilibrium reconstruction becoming a 
cornerstone of routine shot-analysis on all of today's major tokamak experiments, current 
equilibrium reconstruction codes still must intrinsically assume the structure of the GS 
equation to make their inference. Thus, effects of flow, pressure anisotropy, and other points 
of kinetic physics are inescapably ignored in these reconstructions. 

Modern tokamak equilibrium reconstruction works by finding the GS solution which best 
fits a set of given magnetic diagnostic data. This methodology has the advantage that 
such GS fits can be quickly computed but obviously removes the possibility of exploring 
how an inferred equilibrium may differ from a GS solution to better fit the actual data. 
Thus, an ideal compliment to these existing GS solvers is a tool, implemented as a piece 
of software, by which the strength of the a priori GS constraint could be adjusted to gain 
an intuition and quantification of how much an inferred structure (i.e. fit to diagnostics 
with minimal constraints) can deviate from a best-fit GS solution. Bayesian analysis is 
an ideal methodology upon which such a tool can be constructed, as it provides a readily 
accessible means to place non-intrinsic (i.e. adjustable) a priori constraints on inference 
parameters. The advantage of this Bayesian tool over present methods would be two-fold: it 
could quantify how that inferred structure differed from a GS solution; and it could produce 
expectations, uncertainties and higher-order statistical moments on inferred parameters in a 
mathematically rigorous way. This second point encompasses the tool's ability to replicate 
the functionality of modern equilibrium reconstruction codes. 

The reason why such a code has not been previously developed is due to the fact that the 
associated inference problem would be non-linear in nature over a model (inference) parame- 
ter domain of high dimension. Such high-dimensional problems pose intrinsic computational 
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difficulties and can take significant time and computational resources to perform (see Ch. 
29-30 in MacKay^ 1 for a nice discussion on these points). Indeed, traditional GS solvers 
only consider diagnostic fits that are GS solutions, which are subsequently characterised 
by a handful of parameters (usually about five in total) reflecting ID parameterisations of 
the poloidal current and kinetic pressure profiles. Without an intrinsic GS constraint, this 
new code requires a collection of model parameters that characterise the 2D toroidal current 
profile, in addition to the ID poloidal current and kinetic pressure profiles, which signifi- 
cantly increases the number of degrees of freedom (i.e. dimensionality of model parameters) 
in the inference. This paper introduces a software tool based on Bayesian inference that 
uses new methods which help to overcome the computational obstacles associated with a 
high-dimensional, non-linear equilibrium reconstruction, while preserving the advantages as- 
sociated with being able to control the strength of the GS a priori constraint. The result 
is a code which has a unique set of features, which wholly complement those of today's 
state-of-the-art GS solvers and is called the Bayesian Equilibrium Analysis and Simulation 
Tool or BEAST. 

The paper is structured as follows: in £jTT] a general overview of Bayes' formula is given in 



the context of diagnostic data analysis. Next, ^ III gives an explanation of the plasma and 



diagnostic models used in BEAST. Section |IV| details the concept of observation splitting: 
an idea that enables multiple forward models to be associated with a single diagnostic obser- 
vation. This is followed, in §|V] by a discussion on the design of the a priori constraint placed 
toroidal plasma current model. Computational obstacles and the general methodology used 



to overcome them are discussed in SVI, with relevant technical details and benchmarks 



presented in Appendix |AJ Results are presented in §VII| for two high-performance MAST 
discharges. Finally, §VIII contains concluding remarks and discusses future research direc- 
tions. 



II. OVERVIEW OF BAYESIAN DIAGNOSTIC ANALYSIS 

Bayesian diagnostic inference is the mathematical foundation of BEAST. Here, we will 
only give a brief summary of how to apply Bayesian ideas to diagnostic data and encourage 
the interested reader to look elsewhere®^ for more details. In Bayesian diagnostic inference 
(assuming independent diagnostic observations) the goal is to statistically infer a vector of 
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model parameters, denoted A, given a vector of diagnostic data and associated uncertainties, 
/Z and a respectively. A cornerstone of Bayesian inference is the idea that it is impossible 
to perform inference without making some background assumptions^, and these will be 
denoted as A. The specific nature of the assumptions used in this research will be made 
clear throughout the paper. Given this notation, Bayes' formula can be written as 

V(X\fi, a, A) = — _ , (1) 

V{/j,a,A) 

where the notation '|' is read as given and the ',' read as and; e.g. V(a, b\c, d, e) would read 
as the probability of a and b given c, d and e. In this formula, each factor has a common 
name in Bayesian theory: V(X\Jl, cr, A) is called the posterior and is the probability distribu- 
tion of model parameters given a set of diagnostic observations and background assumptions; 
V(fii\X, o~i. A) is the likelihood for a particular diagnostic observation and represents the prob- 
ability that a given configuration of model parameters and diagnostic uncertainties generated 
the associated observation; V(X) is called the prior and is a probability distribution which 
contains a priori information about the model parameters themselves; and V(~p, a, A) is the 
evidence: a normalisation constant ensuring the left-hand side of Bayes' formula integrates 
to unity. Intuitively, one can thing of the posterior as representing an informed state of 
knowledge, when a prior understanding is updated with an observation, which is embodied 
in the likelihood. By using a posterior of on observation as a prior for another observation, 
an iterative process is created by which an initially given prior is updated with any number 
of observations. Finally, the evidence can loosely be thought of as the relative conviction 
one has about the inference of model parameters: a larger evidence corresponds to a better 
pairwise agreement between the prior and all likelihood distributions, given a fixed number 
of diagnostics and fixed uncertainties. This final point will be detailed further later in the 
section. 

If considering only one observation corresponding to /ij and <7j, one can write Bayes' 
formula as 

V(X\/i h a t ) ex V(jn\X, ai)P(X). (2) 

Note that the A corresponding to the background assumptions has been dropped to simplify 
the notation; this convention will be carried out through the remainder of the paper. As fx 
and a are given and thus assumed to be constant, V(Jl, a) is also constant, justifying the pro- 
portionality in Eq. (J2J). The forward model, .F(A), is implicitly contained within V(f/,i\X, o~i) 
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and relates an arbitrary configuration of model parameters to a given set of observations. In 
particular, .F(A) is a deterministic mapping from the vector space of model parameters to 
the vector space of associated diagnostic predictions. These predictions are meant to reflect 
the actual diagnostic reading if the physical system is in a state corresponding to the vector 
input of the forward model. Thus, providing the forward model represent the true physics 
of the diagnostic, values of ^(A) will be the diagnostic measurement. 
Likelihoods are taken to be mappings of the form 



where M is represents a Gaussian distribution over pair-wise independent variables. The 
first argument of the Gaussian distribution represents the mean vector, and the second argu- 
ment reflects the entries in a diagonal covariance matrix. This likelihood form is ubiquitous 
in diagnostic analysis, as diagnostic observations are often associated with Gaussian dis- 
tributions whose standard deviation corresponds to the given error of the diagnostic. The 
structure of the likelihood is further justified when noting that Gaussian distributions serve 
to minimise the Shannon information, when only diagnostic observations and uncertainties 
are knownP^ When viewing the likelihood as a mapping from data-vectors to probability 
distributions over model parameters, Eq. (|3j> indicates that all such output distributions are 
invariant, modulo translation determined by the given diagnostic observations. For such 
likelihoods, one may think of the evidence as a measure of the pairwise consistency between 
all observations and the prior knowledge. This can be understood by thinking of two arbi- 
trary, freely-translating, Gaussian probability distributions with fixed variances multiplied 
together. The integral of the result will decrease as the expectation of these distributions 
move away from each other and is maximised when the expectations are identical; i.e. the 
overlap between the two distributions directly reflects the consistency between both associ- 
ated observations. This statement is quantified by relative size of the evidence. 

Using the particular likelihood in Eq. ([3]), one has the following expression for the poste- 
rior: 



and thus, the posterior is a distribution over the space of model parameter configurations. 
Equation ([5]) is the general representation by which sampling statistics can be used to 
construct moments of the posterior distribution, which directly correspond to expectation 




(3) 
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values and associated errors for the model parameters. Finally, the evidence is calculated 
by integrating the right-hand side of Eq. (Q over the entire domain of model parameter 
configurations. 

III. PLASMA AND DIAGNOSTIC FORWARD MODELS 

MAST is a well-diagnosed machine with over 100 equilibrium magnetic diagnostics and 
a Motional-Stark Effect (MSE) polarimetry system. This paper will focus on inferring the 
2D toroidal current, along with the poloidal current and pressure profiles using equilibrium 
magnetics, MSE and Rogowski coil data measuring the total plasma current and toroidal 
field coil currents. 

A. Toroidal Plasma Current Mode 

The toroidal plasma current is modelled by a collection of axisymmetric current beams 
of rectangular cross section, which fill out the plasma volume. This model was proposed in 
Svensson & Werner^ and has been used successfully to infer the spatially-resolved toroidal 
plasma current in both the JET and MAST experiment s 6 1 7 1 10 ^ . Figure [I] shows the position 
of plasma beams designed to model the toroidal plasma current. These beams have been 
selected such that their combined volume bounds the plasma volume for all the present 
MAST standard operational scenariosP 

If one were to view each beam in Fig.[T]as an independent model parameter, the toroidal 
current model would correspond to 473 degrees of freedom. This number far exceeds the 
number of diagnostic observations available on any given MAST discharge, which is on the 
order of 150 including MSE measurements across the midplane. While this situation poses 
no fundamental problems in Bayesian inference, there is not enough diagnostic informa- 
tion to preclude unphysical "screening" solutions (i.e. where only the beams closest to the 
observation points are inferred with non-zero currents) from being favoured. One option 
available is to treat the plasma beams as nuisance parameters (model parameters which are 
integrated out in the final inference^) and extract only quantities derived off the toroidal 
currents which still maintain a physically meaningful inference. The other option is to utilise 
an informative prior (i.e. not a uniform distribution over a physically unconstraining range), 
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FIG. 1. Colored rectangles indicate cross-sections of pair- wise independent, axisymmetric current 
elements designed to model the toroidal component of the plasma current. Conducting surface 
and poloidal field coil current beams are indicated by white rectangles having black outlines. 
Stars across the plasma midplane indicate MSE observation points, with stars outside the plasma 
volume corresponding to flux loop positions in the (R, Z) plane. Finally, pickup coil positions and 
corresponding azimuthal orientations are indicated by thick black lines outside the plasma volume. 
One will note that there is a tightly packed configuration of vertically-oriented pickup coils in the 
central solenoid column. 



and this will be discussed in £|V} 
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B. Models for poloidal current and kinetic pressure 



Using R, Z, to denote cylindrical coordinates with the Z axis corresponding to the 
central axis of the tokamak, the poloidal flux can be related to the toroidal current by 

^(T L ; R,Z) = ^ [ Jl(R',Z>)RR> dR > dZ >d<t>\ (5) 

2 J D y/{Z - Z' f + R? + RK - 2RR> cos </>' 

where II denotes the collection of currents from the toroidal current model, D the volume 
filled out by the plasma beams, and Jl{R, Z) being constructed as a 2D step function of the 
toroidal current density associated with the plasma model in £ III A ^ Note that in Eq. ^ 
the semicolon is preceded by model parameter inputs and followed by arguments which are 
given as fixed metadata in the inference. 

Kinetic pressure, p(ip), and poloidal current, f(ip), are both taken to be polynomials of 
the poloidal flux, with p(ip) inferred with units of Pascals and f(ip) taken to represent a 
current, in Amperes, which is related to the toroidal magnetic field via 

/(VO = —BB+. (6) 

In particular, p'(ip) and f{ip) are reconstructed from vectors of polynomial coefficients (de- 
noted p~ c and f c for p'(ijj) an d /(VO respectively) according to 

p'{i l ,p1; R, z) = p'(ip) = p cfi + p Ctl ip + p C:2 ip 2 + Pc,3i> 3 ; (7) 

f(h, Jo, R, Z) = EE /(V> 7 ) + fclty - V 7 ) + fcM - 1%) + fcM - (8) 

where ?/> 7 is the poloidal flux at the last-closed flux surface and f(ip 7 ) is the measured toroidal 
field coil current. p(ip) can be recovered by integrating Eq. ([7j and noting that p(ipf) = 0. 
In these parameterisations, ^ 7 is inferred as a nuisance parameter with an un-constraining 
uniform prior. Thus, the inference is on the polynomial coefficients depicted in Eq. ^ and 
Eq. (JsJ) . The priors on p~ c and f c are uniform distributions whose bounds are chosen so as to 
not constrain the inference on MAST discharges. 



C. Diagnostic Forward Models 



The poloidal flux, ip, f and B are related according tcP^ 



Bz(h;R,z) = ^, (10) 

B^(hJ c ;R,Z) = ^. (11) 

The above and Eq. ^ enable the relation between plasma beam currents and magnetic 
diagnostic predictions to be explicitly expressed and numerically implemented.^ Denoting 
the pickup coil forward and flux loop forward models as Tp and J- ' p respectively, one has 
the relations 

T P {T L - R, Z, 6) = B R cos(£) + B z sin(0), (12) 
F f (IE;R,Z)=tJ>, (13) 

where 6 is the angle between a pickup coil's normal and the mid-plane. 6 7 Predictions for 
the MSE system can also be related to Ip and f c via the forward model 

T (T 7~- R 7 ~A\ — A ° Bz + AiBr + A2B(t> n A'\ 

where A is a constant vector corresponding to the particular MSE viewing geometry* 3 * 4 * 6 * 7 ^ 
Finally, the measured total plasma current provides an additional observation corresponding 
directly to 

y T p{Tp) = J2 1 ^- ( l5 ) 



D. Force-Balance Model 

The GS equation is a manifestation of the force-balance relation requiring that kinetic 
(isotropic) pressure balance out the Lorentz force in axisymmetric magnetic confinement 
devices and can be written a d 3 1 13 1 14 1 

MR^) = 2-kR-p'^) + £^fWf'W- (16) 

Instead of using Jp(R, Z) from Eq. (|5j) as the toroidal current density, one can use Eq. (16) 
and Eqs. (|oTj8|, to indirectly construct another 2D toroidal current density function, call it 



Jr(R, Z), from the model parameters introduce in § III A and £ III B 



jR( I L,Pc,fc,ip 7 ;R,Z) = <{ , (17) 

0, IP < l/jy 



where the conditional is meant to enforce no current flowing outside of the last-closed flux- 
surface. If Eq. ( Jl6| ) accurately represents the true force-balance physics of the plasma, then 
Jl{R, Z) = Jr(R, Z) for all points within the plasma volume. Thus, any of the diagnostic 
forward models in §111 C should yield the same predictions regardless of whether they use 



Jl(R, Z) directly or Jr(R, Z) as their toroidal current density input, for their respective 
magnetic field calculations. This indicates that a second set of forward models can be 
constructed, in addition to those presented in §111 C for which the force-balance condition 



in Eq. (16) is intrinsically held to be true. Heuristically, this new set of forward models is 



constructed via the following sequence of reductions: 

1. Il ip (Ampere's Law); 

2. ip, f c ,p~c | — ^ Jr(R, Z) (Grad-Shafranov); 

3. J R (R, Z)Jc^B (Biot-Savart); 

4. Jr(R, Z),B i-)- magnetics, total plasma current and MSE predictions (J r p, J-'f, J^m, J^tp) 



In both sets of models, is constructed directly from f c according to Eq. (11), and the 
total plasma current associated with Jr(R, Z) is predicted by 

7 TP { J R (R, Z)) := / J R (R, Z) dRdZ, (18) 
Jlcfs 



where LCFS is meant to indicate the cross-sectional area within the LCFS. In §IV[ the 
technique of observation splitting will be presented which enables both sets of forward models 
to be used simultaneously in the BEAST inferences. 

At this point, one may suggest that a parameterisation of ip(R,Z) be inferred as the 



base set of model parameters, instead of using the plasma beam model in £ 111 A However, 
inferring ip(R, Z) directly would render the pickup coils and MSE observations essentially 
unconstraining (i.e. ultimately putting little information into the inference), as these forward 
models are based on derivatives of ip(R, Z) (c.f. Eqs. (|9 10)). In particular, only small, local 
perturbations in ip(R,Z) would be needed to match any diagnostic observation associated 
with a forward model that has the local magnetic field as an input. Thus, the lack of magnetic 
diagnostic observations densely distributed across the entire plasma cross-section, makes it 
impossible to infer the poloidal flux directly with any acceptable accuracy or precision, 
without the use of physically-dominating a priori constraints (e.g. only considering the 
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space of GS solutions). While diagnostic systems are being developed that can provide 
densely-packed magnetic field measurements within the plasma (e.g. the 2D MSE system 
proposed by J. Howard^), such diagnostics are still new and not widely deployed, rendering 
direct inference of ip{R, Z) a future research endeavour to be pursued when more of these 
systems come online. 



IV. OBSERVATION SPLITTING 



In the previous section, two general sets of diagnostic forward models were presented 
which differ only in how they construct the toroidal current density: one does so directly 
from the plasma beam model of §111 A the other uses the plasma beam model to reconstruct 



the poloidal flux, which is subsequently fed into, along with other model parameters, Eq. (17) 
to produce the toroidal current density. Neither set of models sufficiently constrain the 
equilibrium inference to rule out unphysical solutions with a minimally informative prior. 
Given that both these models should be equally valid and non-conflicting (if one accepts 



Eq. (16) and associated assumptions), it is desirable to design an inference scheme which is 
able to utilise both these models as constraining entities before resorting to making more 
informative priors. One way to do this begins by trivially rewriting the likelihood in Bayes' 
formula: 



V(\\JL, 



V{ji\Xa)\IV{ji\\a)V{\) 

(iyj 



By iterating and using the assumed form of Eq. (|3| for the likelihood, the posterior can be 
written as 

V(X\JI, a) = _^]_ J] [Nim - 7 Li (X), 2o?WQh - ^(A), 2(7?)] , (20) 

where C(a) is a constant associated with every likelihood normalisation changing due to the 
rescaling of the variance, with -7 7 l(A) and J-"r(X) representing the forward models that use 
Jl(R,Z) and Jr(R,Z) as the toroidal current density, respectively. That is, J 7 l(A) takes 



Eqs. (12-15) as written, and J-'r(X) takes Eqs. (12-15) using Jr(R, Z) instead of Jl(R,Z). 



For convenience, we call this method of refactoring the likelihood to use two different forward 
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models observation splitting. Indeed, this technique is quite general in that any observation 
can be identically refactored across an arbitrary number of different forward models which 
have their model parameters all coming from a common set of physical quantities. For 
example, a system of n coupled differential equations can be written so that each equation 
corresponds to a particular group of forward models with each model associated with a 
likelihood raised to the 1/nth power. 



A. Evidence Reflecting Force-Balance 



From the statements in S III D and SIV, it is clear that the predictions associated with 



J-"i(A) and J- R (\) should be identical for any given diagnostic, if Eq. (16) truly depicts the 



force-balance physics of the plasma. Subsequently, the overlap between both parts of the 



split likelihood (c.f. the product terms in Eq. ( 19 )) will be maximised for any given diagnostic 



observation in this scenario, which serves to increase the integral of the product of likelihoods 



in Eq. (19). As mentioned earlier, the evidence, V(fi, a) is a constant that insures that the 
posterior probability distribution integrates to unity (c.f the denominator in Eq. ([!])). From 
this and the previous statements, it becomes clear that for a fixed prior and set of diagnostic 
errors, the evidence will be sensitive to the agreement between the predictions of Fiity 



and J~r(\) and can thus be used as an indicator of how accurately Eq. (16) reflects reality, 
with a higher evidence (relative to similar shots using the same diagnostics) providing more 
support for the GS force-balance model for that particular discharge. 

Using evidence as an indicator of how accurately the GS force-balance model depicts 
reality gives BEAST additional capabilities beyond just the inference of model parameters. 
As alluded to in the previous paragraph, these techniques can be used to select candidates 
from sets of discharges to investigate new physics that serve to throw off the predictions 
associated with the GS equation, i.e. the discharge with the lowest evidence being the first 



place to look. Moreover, one can also alter Eq. (16) to include additional effects, such as 
flow, for any given discharge and see which model(s) yield the highest evidence. In this type 
of testing, the relative evidence can be used to indicate which models better fit the data or 
if there is sufficient diagnostic information to lend experimental support for one model over 
another. 
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B. Interpretation of Pressure and the Current Density Discrepancy 



As BEAST does not intrinsically impose a GS constraint (i.e. folding Jl(R,Z) = 
Jr(R, Z) into the plasma model itself), one can interpret the difference between Jl(R, Z) 
and Jr(R, Z) as function which characterises a metric quantifying how far away the phys- 
ical inference is from the space of GS solutions. On the other hand, Jl{R,Z) = Jr(R, Z) 
not being intrinsically enforced undermines a strict physical interpretation of the pressure. 
Indeed, as pressure is not directly constrained by physical measurements, it's only physical 
context in the inference is via the GS relation; but the inferred equilibria itself need not 
necessarily be a GS solution. 

The above dichotomy is the cornerstone behind the design of the BEAST code, which 
gives BEAST a complementary functionality relative to other GS solvers: it can quantify 
how a solution differs from a GS fit at the expense of relegating the pressure to a nuisance 
parameter. To this end, A/j is defined as the current difference associated with each beam 
in the plasma current model: 

AI r .= [ \J L (R,Z)-J R (R,Z)\dRdZ, (21) 

where f2j is the cross sectional domain associated with the ith plasma beam current. Using 
the plasma beam configuration in Fig. [I] and the expression in Eq. (22), one can naturally 



define a 2D step function approximation of the differences in current densities as 

AJ(R, Z) := 1 / \J L (R, Z) - J R (R, Z)\dRdZ. (22) 

It is the inference of AJ(R, Z) (i.e. the difference between inferred structure and the space 
of GS solutions), which differentiates BEAST from GS solvers. 

Finally, It is clear that AJ(R, Z) is tantamount to an additive correction to the GS 
equation itself; and thus, one could seek to infer this quantity directly. However, this would 
require that physical model parameters (i.e. the ones not associated with the correction 
itself) be intrinsically constrained to the space of GS solutions. Given the semi-linear struc- 
ture of the GS equation, computation of any GS solution takes several iterative steps and 
would be required for every sample of or exploratory step taken in the posterior distribution, 
if a GS constraint was intrinsically enforced. As each inference will require up to hundreds of 
thousands of evaluations of the posterior, such a constraint would require a huge amount of 
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computational time/resources. BEAST avoids these difficulties by constraining a set of for- 
ward models according to the GS relation and then inferring A/, as opposed to constraining 
the solutions space of model parameters directly and inferring an additive correction. 



V. PRIOR SELECTION FOR THE TOROIDAL PLASMA CURRENT 
MODEL 



As indicated in £ 111 A minimally informative priors will not prevent unphysical "screen- 
ing" solutions from being inferred. In the past, priors which enforce a degree of smoothen- 
ing across the collection of plasma beams have been used in plasma current tomography to 
mitigate this effect These priors (the conditional auto-regressive prior in Svensson and 
Werner^ and the Gaussian process prior in von Nessi, et. alP) were selected primarily be- 
cause they both are fundamentally Gaussian distributions. These priors, combined with 
linear forward models in the likelihoods, ensured that the associated posterior was also a 
Gaussian distribution, which subsequently could be analysed by fast, analytic computa- 
tional methods. Unfortunately, the non-linear nature of the force-balance forward models 
used in BEAST immediately render the associated equilibrium inference beyond the scope 
of analytic inversion techniques. 

As there is no prior that will reduce BEAST inferences to analytic computations, prior 
selection is made on the second priority of preserving the interpretation of the posterior 



evidence and AJ, defined in Eq. (22). Both of these quantities will obviously be dependent 
on the prior, and this must always be taken into account. However, when comparing an 
inferred evidence between two different inferences, it is only a requirement that the priors 
in both inferences be the same, to draw a meaningful comparison. Thus, it is natural to 
focus on preserving the interpretation of AI as a proxy for quantifying the discrepancy from 
a best-fit GS solution. Naively, one can design a prior which favours configurations where 
A7 = 0: 

V(h, Jc, %, ^y) = A/"(0 - AT, a% 

= A/-(A7,^ 2 T), (23) 

where 1 is the identity matrix, with of a scalar constant to be set that reflects the strength of 



the prior. Indeed, one may think of Eq. (23) embodying a collection of "virtual" likelihoods, 
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having AI as the predictions with the zero vector being the associated observations. The 
term 'virtual' is used here, as this construction is obviously not dependent on any real 
data; and thus, is genuinely a prior construction, even though it superficially resembles a 
likelihood. 



The prior in Eq. (23), ensures that the inference is biased toward model configurations 



satisfying the GS relation embodied in Eq. (16) and subsequent forward models. This is 
exactly what is desired as we wish AI to reflect the distance from the physical inference to 
the best-fit GS solution. Moreover, the strength of this prior is readily adjusted by changing 
the value of a\ and is the adjustment that determines how strongly GS force-balance is 
enforced in the associated equilibrium inferences. 



In practice, the prior in Eq. (23) is very effective at precluding non-physical screening 
solutions from being inferred, even with very high values of o\ (i.e. being less informative). 



Indeed, typical ranges for II for the plasma beam configuration in Fig.[l|range from 0-10kA 
in typical MAST discharge, and setting a* as high as 5kA still gives inferences which are 
physical. 

The general idea of this virtual observation construction of the GS prior was originally 
proposed by Forcffl However, in that work, the number of employed virtual observations 
had to be chosen via an external method. The advantage of the dual set of forward models 
is that AI provides a canonical means by which a single virtual observation is associated 
with every beam in the original toroidal current model. 



VI. COMPUTATION 



In general, the posterior distribution in Eq. (20) for a MAST discharge will be a non- 
Gaussian distribution over approximately 450 model parameter dimensions. Integration of 
and sample generation for such high-dimensional posteriors historically have been compu- 
tationally difficult and/or intractable problems.^ Moreover, traditional methods based on 
approximating the posterior using Gaussian distributions will generally give poor and/or 
unreliable results. Indeed, the only way such approximations can work is if all dimensional 
projections of the posterior are well-approximated by one or more Gaussians. While this 
may be reasonable to assume for a low dimensional problem, in many dimensions it will gen- 
erally be the case that there will be at least some projections which are poorly approximated 
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by normal distributions. To compound the problem, the complexity of the forward models 
often associated with so many model parameters makes deciphering which projections would 
be well-suited to such an approximation a difficult or impossible proposition. 



The other, common alternative to Gaussian approximation to analyse high-dimensional 
posterior distributions is to use algorithms based on the Markov Chain Monte-Carlo 
(MCMC) concept. There is a wealth of literature on the topic of MCMC algorithms that 
describe their associated issues in high- dimensional inference problems Putting these 
issues aside, MCMC methods can only yield samples of the posterior and are unable to pro- 
vide an estimation of the posterior evidence Vijl^a). Indeed, the issue of high-dimensional 
posterior integration is a topic of current research, for which the author knows of only one 
viable algorithm for its computation: the Nested Sampling (NS) method originally proposed 
by SkillingPa 



As the evidence plays an important role in the interpretation of BEAST results (c.f. 



SIVA), the implementation of BEAST centres around the NS algorithm. Beyond being a 
necessity for calculating the evidence, NS also provides a means by which posterior sampling 
can be simulated; this will be elaborated upon in what is to follow and in Appendix |A} 
In general, NS works by transforming the multi-dimensional evidence integral to a one- 
dimensional integral that can be integrated via a statistical quadrature. This quadrature is 
constructed from samples taken from the prior under a likelihood constraint. However, in 
addition to providing a means to calculating the evidence, this collection of samples can also 
be used as a compressed representation of the posterior itself. That is, a posterior probability 
may be canonically associated with each quadrature sample, so that probabilistic selection 
of elements from the entire set, based on this associated posterior probability, reflects direct 
sampling from the posterior. Thus, NS provides a means for both evidence calculation and 
sample generation. 



Beyond these generalities, there are many subtleties and computational obstacles as- 
sociated with the use of the NS algorithm. These points are discussed and addressed in 
Appendix [A] along with some benchmarking results presented for different configuration of 
BEAST run parameters. 
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VII. RESULTS 



To demonstrate the capabilities of BEAST, two high-performance MAST discharges are 
analysed: #22254 at 350ms and #24600 at 265ms. Both of these discharges/times are 
strongly shaped double-null diverter (DnD) plasmas with 3.13MW and 3.35MW of NBI 
heating respectively. However, these discharges differ in that #22254 at 350ms is in a 
H-mode configuration with #24600 at 265ms being in L-mode. Both discharges are well 
diagnosed, having approximately 76 pickup coil, 24 flux loop, and 31 MSE observations 
recorded and available for equilibrium inference. Values for BEAST run parameters used in 
the following results are presented in Tab.|Tj 



Run Parameter 


Value 


sizeSamplePool 


150 


numEvidenceSamples 


36 


numABIFailures 


1000 


numMCMC Jumps 


20 



TABLE I. Typical run-parameters for BEAST used in equilibrium inference for MAST discharges 
#22254 at 350ms and #24600 at 265ms. Definitions for each these run parameters are given in 
Tab. [n] in Appendix [A} 



In addition to the toroidal/poloidal currents, currents for conducting surfaces and poloidal 



field coils were also inferred using the same Biot-Savart forward models detailed in §111 C 
Moreover, additive biases to both pickup coil and flux loop signals were inferred to offset any 
errors in magnetic calibrations. These additional model parameters are treated as nuisance 



parameters in the final inference of plasma currents and kinetic pressure. The data in Tab. Ill 
in Appendix [A] corresponds to inferences run with these additional nuisance parameters. 
An avenue of research being explored is to characterise the impact of pre-computing these 
conducting surface currents and biases using the analytic current tomography introduced 
by Svenssion and Werner^ and subsequently locking these values in a non-analytic BEAST 
inference. The advantage of this would be speeding up BEAST execution times beyond the 
values reported in Tab.|III|in Appendix [Aj 
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A. Toroidal Current Density Inference 



Jl Expectation Jr Expectation AJ Expectation 




FIG. 2. Expectation values of Jl(R,Z), Jr(R,Z) and AJ(R,Z) inferred for MAST discharge 
#22254 at 350ms, as calculated from 1800 samples of the posterior, using pickup coils, flux loops, 
MSE and Rogowski coil data. The variance for the prior has been set to a 2 = .Ol(kA) 2 for this 
inference. The inferred LCFS is indicated in black on each figure with the EFIT calculated LCFS 
overlaid in purple. Flux loop locations are indicated by stars outside the plasma region; position 
and orientation of pickup coils are indicated via heavy bars on the out-board edge of the first wall 
and as a vertically oriented column line along the solenoid; and MSE observation positions are 
indicated by the stars across the mid-plane inside the plasma region, (a) shows Jl(R,Z) current 
density data, with the current densities in (b) reflecting that of Jr(R,Z). Note that the number 
and size of beams representing Jl(R, Z) and Jr(R, Z) are allowed to differ in BEAST inferences, 
(c) shows the magnitude of the current density difference as averaged across each 2D rectangular 
step corresponding to Jl(R,Z). 

Figure |2| shows the expectations of J L (R,Z), J R (R, Z) and AJ(R,Z) for #22254 at 
350ms, with a 2 = .Ol(kA) 2 and reflects typical toroidal current density cross sections inferred 
for DnD MAST charges with MSE data. These expectations have been calculated using 
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sampling statistics generated from 1800 simulated samples of the posterior. The LCFS from 
both BEAST and EFIT are presented in the figure and are in excellent agreement, especially 
around the X-points of the plasma. The slight discrepancy of the LCFS between BEAST 
in EFIT on the outboard edge is due to the fact that the presence of MSE observations 
making BEAST infer an LCFS which is slightly withdrawn into the plasma, when compared 
to EFIT, which was not utilising MSE constraints in this particular discharge. 

The pixelation in Fig. [2] (a) is simply a reflection of the plasma beam model described in 
III A and displayed in Fig.[TJ The pixelation in Fig. [2] (b) reflects the fact that Jr(R, Z) is 



approximated as a collection of densely packed, axisymmetric current beams, which enables 
the same set of algorithms to be used in forward model calculations associated with both 
Jl{R,Z) and Jr(R,Z). However, as Jr(R, Z) itself is a derived structure from II, f c and 
p^, increasing the number of beams used to represent this function (beyond just the number 
of II model parameters) does not increase the degrees of freedom in the inference. Thus, 
a more dense set of beams was selected to represent Jr(R, Z) to increase the accuracy in 
associated forward model calculations. Finally, one will note that the definition of AJ(R, Z) 



in Eq. (22) is such that it will intrinsically reflect the beam configuration associated with 1^. 

Figure [2] (c) shows the regions of the plasma where the highest discrepancies from a GS 
solution are occurring. Given the relatively strong prior constraint, it is unsurprising that 
the AJ(R, Z) values in Figure [2] are relatively small, when compared to both Jl(R, Z) and 
Jr(R, Z). While AJ(R, Z) can give some indication as to physical effects neglected in the 
GS equation, it can also be a reflection of where the plasma is simply more constrained by 
diagnostic observations. Thus, AJ(R, Z) should be taken as a queue of how one may be 
able to resolve additional physics but can not be used to infer such physics directly (at least 
without employing additional constraints). 

Toroidal current density data for discharge #24600 at 265ms is presented in Fig.[3j which 
is analogous to the data for #22254 at 350ms in Fig.[2j Even though the same strength of 
prior was used in both sets of figures, one can see that AJ(R, Z) is, in general, significantly 
larger for discharge #24600 than for #22254, indicating that even with a prior variance of 
ol = .Ol(kA) 2 , the inferred equilibrium can still have a strong discrepancy from a best-fit 
GS solution. Indeed, the differences in the general structure of Jl(R, Z) and Jr(R, Z) can 
be readily seen when comparing (a) and (b) of Fig.|3j with Jr(R, Z) indicating a current 
hole equilibrium. One can also see that the MSE observations are strongly inferring a GS 
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(c) 



FIG. 3. Expectation values of Jl(R,Z), Jr(R, Z) and AJ(R,Z) inferred for MAST discharge 
#24600 at 265ms, calculated from 1800 posterior samples with a prior strength corresponding to 
a"l = .01(A;yl) 2 . These figures are inferred using the same diagnostics and have the same visualisa- 
tion as in Fig. [2} 



discrepancy at the outboard edge of the plasma. 



B. Inference of the Poloidal Flux 



By direct application of Ampere's law, the poloidal flux function can be calculated from 
II- Moreover, this calculation can be done for every sample of II taken from the posterior 
to generate statistics on the poloidal flux function itself. Figure [| shows the poloidal flux 
function expectation and standard deviation as calculated from 1800 samples of the poste- 
rior from #22254 at 350ms. As the magnetic field geometry is very similar for discharge 
#24600 at 280ms, the poloidal flux cross-section is only presented for #22254. In addition to 
calculating statistical moments of the poloidal flux function, uncertainties for individual flux 
surfaces can be visualised by shading according to accumulated line density of a particular 
contour over all samples; such a visualisation is present in Fig.|4] (c). 
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Poloidal Flux Expectation Poloidal Flux Standard Deviation Flux Surface Uncertainties 




(a) (b) (c) 



FIG. 4. Poloidal flux function expectation and standard deviation as calculated from 1800 samples 
of the posterior from shot #22254 at 350ms, with a prior variance set to al = .Ol(kA) 2 . Positions 
of magnetics and MSE observation points are indicated as they were in Fig.[2} In (a) and (b) the 
EFIT LCFS is plotted in white with the inferred LCFS overlaid in black. Uncertainties for the 
■0n = {0, .25, .5, .75} flux surfaces are indicated by the shading in (c) and is overlaid, in red, by the 
MAP value of these contours. 

C. Effects of Altering the Prior Variance 

As Fig.[2]and Fig. [3] indicate, fixing the prior variance by no means ensures that the GS 
discrepancies will be even roughly the same between discharges. This immediately provides 
a means by which to rank different discharges/times in an order reflecting which discharges 
are more likely to embody new physics beyond standard GS force-balance: shots with the 
generally larger AJ(R, Z) for a fixed prior variance are the first places one should look for 
new physics. However, changing the variance of the prior in a single discharge can also 
provide a useful means of determining if the shot is a good candidate for further physics 
investigations. Figure [5] shows how AJ(R, Z) changes as the prior variance is scaled over 
four orders of magnitude. While the overall magnitude of AJ(R, Z) changes significantly 
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FIG. 5. AJ{R, Z) expectations for #22254 at 350ms for various prior variances. Data acquisition 
and visualisation information is the same as that for Fig.[2| 



as one goes from (a) to (b) in Fig.[5j there are persistent characteristics across multiple 
sub-figures. By scanning the variance, one can get clues as to whether a discrepancy is due 
to a diagnostic mis-calibration/conflict or the order of magnitude of a new physics effect. 
Of course, this type of analysis has to be tailored for the shot in question; but Fig.[5]does 
give a clear indication that one can expect to find both varying and persistent discrepancies 
from a best-fit GS solution as the prior variance is altered, which can possibly be used to 
help diagnose new physics. 



D. Profile Inference and Errors 

Statistical moments for q and poloidal current are also routinely inferred in BEAST 
computations. Unlike the kinetic pressure, these quantities are directly constrained by the 
MSE observations and thus retain a standard physical interpretation. Indeed, the profiles 
presented in Fig. [6] remained virtually unaltered when the forward models associated with 
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q vs. Normalised Poloidal Flux Poloidal Current vs. Normalised Poloidal Flux 




FIG. 6. Inferred q and poloidal current profiles as a function of normalised poloidal flux for #22254 
at 350ms. The purple and red lines indicate the expectations for the q and poloidal current profiles, 
respectively. The shading indicates sample density reflecting 1800 samples from the posterior. 



Jr(R, Z) were removed from the BEAST inference and the GS prior replaced by a GP 
smoothening prior (see von Nessi, et. alP about the use of Gaussian Processes in current 
tomography). As the shape of the poloidal current profile is strongly constrained by the MSE 
observations, via Eq. ^ and Eq. (14), with an intrinsic constraint to the toroidal field coil 



current (c.f. Eq. ([8])), it is expected that this profile would have little to no uncertainty in the 
inference. Moreover, as q is closely related to the magnetic pitch angle, MSE measurements 
alone serve to strongly constrain this profile. 

Beyond the above justifications, the models presented in this paper will generally lead 
to inference on equilibrium parameters with very small uncertainties. That is, the inference 
reflects a variational problem with a unique solution. As stated throughout the paper, 
BEAST infers solutions which are the most consistent with all diagnostic observations under 
a priori constraints imposed by the prior and forward models. As there is no direct diagnostic 
constraint on the pressure profile there exists sufficient degrees of freedom so that best fits 
to all diagnostic measurements uniquely exist. Even when decreasing the prior variance to 
al = .001 (kA) 2 significant errors on the pressure are not seen, indicating that the space of 
GS solutions intrinsically contains enough degrees of freedom to contain model parameter 
configurations which accurately predict all diagnostic observations. Thus, it is expected that 
one will only start seeing errors appear, if a direct constraint for the total kinetic pressure 
can be formulated for MAST discharges. This is a current research endeavour. 

Figure [7] presents poloidal current and pressure profiles for discharge #24600 at 280ms. 
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Poloidal Current vs. Midplane Radius 



Kinetic Pressure vs. Midplane Radius 




R(m) R(m) 



FIG. 7. Inferred poloidal current and kinetic pressure profiles for #24600 at 280ms, right after 
the southwest neutral beam disrupts. The dotted lines indicate MAP profiles, with thick lines 
being the expectations. The thin lines represent upper/lower confidence intervals of 95%, with the 
shading indicating sample density as determined from 1800 samples from the posterior. 



These profiles are included to show BEAST's capability of inferring uncertainties even with 
pathological posteriors. Indeed at 270-290ms one of the neutral beams for this discharge 
disrupts. This situation is reflected in the expanded uncertainties in both profiles. One 
will also note that the expectation and MAP profiles are significantly different for both the 
poloidal current and kinetic pressure, indicating a marginalised distribution that is non- 
Gaussian. 



As discussed in §IVB pressure in BEAST inferences is effectively a nuisance parameter; 



but it is useful to verify that the kinetic pressure being inferred is still physically plausible. 
Figure [8] presents the inferred pressure profile for #22254 as a function of normalised flux 
with a comparison to the EFIT-calculated pressure profile. This comparison is not meant 
to be any validation of either BEAST or EFIT beyond indicating that the discrepancies 
shown in Fig.[2] (c) and Fig.[3] (c) correspond to physically plausible pressure, as opposed 
to a vacuum or negative pressure solution to the GS equation. Indeed, without any direct 
diagnostic constraints on the pressure, no rigorous, physical interpretation of the BEAST- 
inferred pressure can be made. 
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Pressure vs. Normalised Poloidal Flux 




FIG. 8. Inferred pressure profile as a function of normalised poloidal flux for #22254 at 350ms. The 
blue line indicates the expectation of the pressure profile as calculated from BEAST. The shading 
indicates relative uncertainty as determined from 1800 samples from the posterior. Overlaid in red 
is the pressure profile calculated from EFIT. The comparison is only meant to indicate that the 
discrepancies from force-balance shown in Fig. [2] (c) correspond to a physical pressure profile. 

E. Scalar Quantities and Evidence Calculations 



With regard to kinetic scalar quantities, again caution must be taken in interpreting 
BEAST results, as such inferences do not intrinsically reflect GS solutions. However, as 
EFIT is known to reliably infer the quantity (3 P + ^, even without MSEP it makes sense 
to verify that the values that BEAST infers for this quantity converge to that of EFIT 
as the prior variance decreases. Indeed, for #22254 at 350ms with a prior variance of 
0-2 = ,ooi{kA) 2 , /3 P + f had an inferred value of 1.0676 ± .0002, compared with the EFIT 
value of 1.0782, which collaborates well with the small average value of AJ(R, Z) seen in 
Fig.[5] (c). For #24600 at 265ms with a prior variance of a 2 = .Ol(kA) 2 , (3 P + ^ was inferred 
to be .6873 ± .0002, with EFIT calculating 1.041. While this discrepancy is large, it very 
consistent with the large AJ(R, Z) presented in Fig.|3] (c). 

The log-evidence was also calculated for both these discharges having the prior variance 
set at al = .Ol(kA) 2 . For #22254 at 350ms log(P(L>)) = 1677.863 ± 1.342; and for #24600 



at 265ms, log(P(.D)) = —1290. 037 ± 1.129. From the discussion in £IVA comparison of 
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these values strongly indicates that #24600 at 265ms is most likely farther away from a GS 
solution when compared with #22254 at 350ms; and indeed this conclusion is consistent 
with the above /3 p + ^ calculations, as well as the results presented in Fig.J^c), Fig.[3](c). 

While results have been presented here for two MAST discharges, it should be noted that 
BEAST inferences have been run on dozens of discharges and run parameter configurations. 
Indeed, most inferences have produced data that strongly resembles what has already been 
presented in this section; and thus, have not been detailed further in this paper. 

VIII. CONCLUSIONS 

The feasibility of using Bayesian analysis in inferring plasma equilibria in a tokamak 
under a force-balance constraint has been demonstrated. In particular, the new technique 
of observation splitting was introduced, which subsequently enabled GS solutions to be 
efficiently characterised in the prior and inferred differences from such solutions to be quan- 
tified. Moreover, an implementation of the nested sampling algorithm has been presented 
which can be utilised as the foundation for high- dimensional, non-analytic inference prob- 
lems. These two points have culminated in a code, named BEAST, which can not only 
infer equilibrium parameters but can also be an indicator of new physics by quantifying 
discrepancies from GS solutions directly and via the evidence associated with the inference 
itself. This code also has the capability of being readily modified to include new physics that 
amends GS force-balance and can subsequently be utilised to validate different force-balance 
models against one another via experimental data. Given these points, it is clear that this 
code exists as an ideal complement to the fast GS solvers already running routine analysis 
on many of today's tokamaks. 

Current research endeavours surround exploiting BEAST as physics exploration tool in 
equilibrium studies on the MAST experiment. In particular, the construction of a direct and 
computationally tractable kinetic pressure constraint is currently being pursued. However, 
work is also being done to re-implement BEAST a parallelised code to run on a wide range 
of computational platforms, including HPCs, with an ultimate goal of making BEAST infer- 
ences an arbitrarily fast computation (depending on the number of available CPUs) suitable 
for routine post-analysis. 
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Appendix A: Evidence Calculation and Posterior Analysis using Nested 
Sampling 

What follows is a review of Skilling's Nested Sampling (NS) algorithm and its imple- 
mentation in BEAST. Specifically, a modified implementation of this algorithm is used to 
calculate the posterior evidence as well as simulate posterior sampling in BEAST inferences. 



Transformation of the Evidence Integral 

The NS algorithm was originally created to calculate evidence integrals for Baysian pos- 
teriors (c.f. the denominator in Eq. (jl|), which were sufficiently complex and/or of high 
dimension such that they were poorly integrated via standard analytic or statistical quadra- 
tures. However, before being able to utilise NS statistics, the evidence integral first needs 
to be transformed. While this transformation is normally presented as part of the nested 
sampling algorithm 8 17 ■, these presentations tend to be heuristic in nature. What follows is 
a concise, rigorous derivation of this transform which is then put into context of the overall 
algorithm used to calculate the evidence. 

In essence, NS involves transforming the multi-dimensional evidence integral into a one- 
dimensional integral, which is amenable to statistical integration via the a statistical tech- 
nique called "nested sampling" . Taking n to represent the dimension of the posterior domain, 
one can write 



V{jt,a) = [ V{-p,a\X)V{\)dX 

JM." 



V(p,a\\) 

V(X) dtdX 



o 



CO 



'0 J {\\V(JX,a\X)>t} 

Next, the following function can be defined 



/ _ V{X)dXdt. (Al) 



£(*):= / V{X)dX 

J {\\V{p,a\X)>tj 

= prior proportion with likelihood greater than t. (A2) 

It is clear that is a decreasing function having range [1,0] and domain [0, oo). Thus, 
£ _1 (t) is also decreasing and defined up to a set of measure zero on the domain [0, 1] (although 
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not necessarily continuous). It is a classical result of real-analysis (c.f. RoyderP^ Ch. 5 Thm. 
3) that enables one to conclude that exists almost everywhere in the domain and is 



Lebesgue integrable. Thus, we can rewrite Eq. (Al) using Eq. (A2), make the substitution 
v = and integrate by parts to find: 

POO 

V(p,w) = / £(t)dt 

JO 

v-£- v (v)dv 

j.1 

v-r\v) + / c\v)dv 

1 JO 

1 

C\v)dv. (A3) 



o 



Unfortunately, the integral representation in Eq. (A3) is poorly approximated by standard 
quadratures as most of the integrand's mass is highly condensed around v — 0. However, it 
will be shown below, that this representation is particularly amenable to integration via a 
statistical quadrature based on the nested sampling principle. 

The closed form expression of is an integral over n — 1 dimensions and offers 



little help in the evaluation of Eq. (A3), as all the problems associated with the original 
integration re-emerge. Moreover, getting an intuitive idea of £ _1 (t) to understand how it 
can be statistically integrated is a subtle endeavour. However, one can start to get this 
understanding by considering a number, say m, of samples extracted from the prior ordered 
from highest to lowest associated likelihood value. Denoting £j as the likelihood value of the 
zth ordered sample with L\ > £ 2 > • • • > An, one then has the following approximation: 

eoc) r 1 (-) . (A4) 

That is to say, if given m prior samples ordered according to decreasing associated likelihood 
value, one expects that approximately i/m of samples coming from the prior (i.e. proportion 
of the prior) will have associated likelihood values greater than Li. It is in this sense that 
likelihood-ordered prior samples can be associated with uniform samples of the abscissa for 



the integrand in Eq. (A3). Indeed, using t to denote a sample from a uniform distribution 
on [0, 1], if one were to take m such samples and re-order them so that t m > t m _i > ■ ■ ■ > £ 1; 
then Eq. ( |A4[ ) can be written as 

« t t => Ct « r 1 (ti) ■ (A5) 
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This suggests that one can integrate using quadratures associated with multiple resamplings 
of the abscissa (i.e. collections of m uniform samples on [0, 1]) to find uncertainties in the 



evidence, which are a direct consequence of the approximation in Eq. (A5). The specifics 



of the numerically integrating the expression in Eq. (A3) using Eq. (A5) will be discussed 
below. 



Generating Posterior Moments via Sample Simulation 

The values of £ _1 (£) used in the evidence integral calculation can also be used to simu- 
late samples from the posterior. More specifically, with the evidence known, it is possible 
to associate a posterior probability with each prior sample used in the evidence quadra- 
ture. Sampling the values used to construct the £ -1 (t) graph according to their posterior 
probabilities then serves as a proxy (i.e. simulation) for direct sampling from the posterior. 
Thus, using these simulated samples, moments of any function of the posterior pdf can be 
calculated using simple sampling statistics. 

To gain a rigorous understanding of the above statements, it is necessary to calculate the 
derivative of £(£), which can be done via the co-area formula (c.f. Federer^ Sec. 3.2) and 
Leibniz rule: 

' {\\VQZ,a\X)>t} 



dH n -i(X)ds 



dt 

_d_f°°f VW 
dtJt J{x\vCfl,<?\\)=s} |VP(/I,a|A)| 

= -/ V ^ > _- dn n -i(X),. (A6) 

J {\\v(ji,a\x)=t} \W(ii,a\X)\ 

where T-L n -\ is the Hausdorff measure of co-dimension 1 relative to the dimension of model 
parameters. Taking arbitrary < Vq < v% < 1, one can now calculate 



£-\v)dv= t __- dHn-i(X)dt, (A7) 



V(X) 



v Jt-Hvi) J{\\v(-p,*\\)=t} \ W((x,a\X)\ 



via the substitution of v — £(£) and using Eq. (A6). On the other hand, the co-area formula 
and the definition of £(£) also indicate 

/_ V(ji,(f\X)V(X)dX = 

-/{A I «- 1 (^)>P(P^|A)>?- 1 (fi)} 

/•r x («o) r <p(J) 

/ */r ( A8 ) 

Ji-Hm) J {x\v(ji,u\x)=t} \vV{fi,(r\\)\ 
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Thus, equating Eq. (A7) and Eq. (A8) leads to 

-l 



r (v)dv 



I'd 



V{jL,a\X)V(X)d\. 



(A9) 



' {x\^{v )>V(Ti,a\X)H- 1 (v 1 )} 

As £ _1 (t) is a decreasing function, any partition of [0, 1] can be mapped to a finite covering 
of IR n via sets defined by { A | ^ 1 (v i ) > V(]I, <r|A) > where Vi and v i+ i are succes- 

sive points of an ordered partition on [0, 1]. This cover will have pairwise non- intersecting 
members outside a set of measure zero, as £ _1 (t) is not necessarily monotone decreasing. 



Thus, Eq. (A9) allows for posterior probabilities to be associated with definite integrals of 



£ -1 (t> ). That is to say, if samples are drawn according to the posterior pdf and associated 
with values of £ _1 and v (done by explicitly calculating the associated prior and likelihood 
probabilities of the sample and using the definition of £(£)), one will find that the number 
of samples within [t>j + i,t>j] (relative to other intervals) is precisely equal to the LHS integral 



in Eq. (A9). Moreover, since the the full evidence integral can be calculated, one has that 

r r» 



v 



V ,Vl\ 



dv, 



(A10) 



as being the proportion of posterior samples able to be associated with the interval [uo,fi] 



for any < Vq < v± < 1. Relating this back to the evidence and Eq. (A5), one can now 
write: 



C x (y)dv 
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(All) 



i=2 



Using Eq. (A10) as the representation of the posterior probability for the integrals in the 



sum in line 2 of Eq. (|A11|) enables one to associate the zth point of the evidence quadrature 



with a posterior probability: 



Vi := " _ " (A12) 

That is to say, the prior sample used to generate the ith point of the £ -1 graph is taken to 
simulate a posterior sample with the posterior probability Vi. 



Equation (A12) allows one to statistically calculate integrals of any function of the pos- 



terior pdf. In particular, the relative entropy, denoted 8, between the posterior and prior 
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can now be computed: 



/ P(F|A, a: 



j^ )lo z\-mw) dX (A13) 

«E^(^)- < A14 > 

The relative entropy is a reflection of how much information is required to move from the 
prior to the posterior and will play a part in the termination criteria used by BEAST in its 
NS implementaiton, described below. 

While this ability to calculate integrals/moments of the posterior is useful, it is often 



more convenient to exploit Eq. (A12) to simulate samples from the posterior directly. Indeed, 



Eq. (A12) enables posterior probabilities to be associated with the set of likelihood-ordered 



prior samples used in Eq. ( A5 ). Thus, one is left with a set of model parameter configurations 
and associated posterior probabilities. This set can be viewed as a compressed version 
of the posterior, where extractions of model parameter configurations according to their 
respective posterior values are correctly interpreted as a simulation of direct sampling from 
the posterior. The caveat is that with every moment calculated from such a simulation, there 
will be associated uncertainties. However, this uncertainty has been consistently observed 



to be negligible in the context of the underlying errors associated with the results in \ VII 



Thus, for the sake of clarity, only averages of these moments are presented in £VII 
Numerical Integration via Nested Sampling 



Equation (A5) already shows how a likelihood-ordered set of prior samples and sets of 



uniform samples on [0,1] can be used to evaluate the integral in Eq. (A3). In practice 
however, most of the mass associated with will usually be strongly condensed around 

t = 0. In particular, an efficient abscissa spacing for evaluating the evidence, as expressed in 



Eq. (A3), can vary over many orders of magnitude, when considering the whole domain of 
integration. Moreover, it's not possible to know the finest spacing needed to evaluation the 
integral to within a certain accuracy a priori; this would require detailed knowledge of the 
posterior before any inference calculations were made. Even if one did know this number, 
it would normally imply the use of so many points in the quadrature that the calculation 
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of the integral over most of the domain would be rendered inefficient to the point where 
the overall integration becomes computationally intractable due to requiring too much time 
and/or memory to calculate. Thus, naive analytic and statistical quadrature methods will 



still poorly approximate the integral in Eq. (A3), due to needing a high and variable 

precision in the abscissa around t — 0. 

Skilling^ developed the method of nested sampling to deal with the specific issue dis- 
cussed above, which can be summarised in the following pseudo-code: 

1. Start with a set of m samples from the prior, with a corresponding set of m uniform, 
abscissa samples on [0, 1]. 

2. Order the prior samples according to their likelihood values, while ordering the abscissa 
samples according to their value. 

3. Extract the prior sample with the lowest likelihood along with the abscissa sample with 
the highest value. Store this as a tuple containing the likelihood and abscissa 
values respectively. Note that i is meant to indicate the index of the extracted tuple. 

4. Replenish the initial prior sample pool by continuing to extract a new prior sample 
until such is found with a likelihood greater than £j. Replenish the pool of abscissa 
samples by drawing a sample from a uniform distribution on [0, ij]. 

5. Using all extracted tuples construct a current approximation of the evidence and rel- 



ative entropy using a trapezoidal quadrature, Eq. (A3), Eq. (A5) and Eq. (A14) 



6. If the number of iterations exceeds 2mS, terminate the algorithm, else return to step 
2. 

Note that without a priori knowledge of the posterior, the choice of termination criteria 
is subjectiveP The above pseudo-code reflects the termination criteria used in BEAST and 
is the one suggested in Sivia & SkillingP. For details regarding the expected accuracy and 
reasoning behind the choice to limit iterations to 2m£, the interested reader is encouraged 
to read section 9.2.2 of Sivia & SkillingEI. 

It is clear that the number m elements will always be preserved in the prior and abscissa 
sample pools, while the likelihood/abscissa constraint will monotonically increase/decrease. 
This leads to a statistical quadrature that naturally accumulates around t = 0, which 
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solves the problem outlined in the previous section. While it is a simple matter to extract a 
uniform sample from [0, tj\ for any tj > 0, extracting prior samples under a growing likelihood 
constraint will become an increasingly difficult computation as the iteration proceeds. In the 
context of BEAST inference, sampling the prior ab initio (i.e. naively sampling the prior and 
then testing its likelihood), will quickly have the algorithm failing an unacceptable amount 
of times before generating a prior sample that can meet the current likelihood constraint. 
Thus, a method was developed that would continue to efficiently generate prior samples 
throughout all iterations of the algorithm. This technique is outlined in the next section. 

Finally, as discussed earlier, multiple abscissa sample sequences are generated to calcu- 
lated the uncertainties of the evidence using sampling statistics. As nested sampling of both 
the prior and abscissa samples can proceed independently, BEAST works by generating one 
sequence of likelihood-ordered prior samples and many sequences of abscissa samples to cal- 
culate samples of the evidence integral. Once the first abscissa sequence is generated, the 
above pseudo-code is rerun using stored likelihood-ordered prior samples with a new abscissa 
being generated as the iteration proceeds. The prior sample sequence is extended in the case 
where more prior samples are needed to meet the termination criteria. In practice rerunning 
the above pseudo-code with stored prior values takes a very small fraction of the time, as 
compared to when the prior sample sequence has to be actively generated. Once the above 
iteration has run for all abscissa sequences, sampling statistics are applied to the calculated 
evidences for each abscissa. 

Priori Sampling and Optimal Seeding 

Unfortunately, in high dimensional inferences, NS is still not sufficient to accurately 
calculate the evidence of the posterior in a time which would be useful to scientists (more 
than weeks for a typical equilibrium inference as outlined in this paper). There are two main 
reasons for this: first, the algorithm is very unlikely to find the regions of high density in 
the posterior; and secondly, generating prior samples becomes increasingly difficult as the 
likelihood constraint increases. Intuitively, it is easy to appreciate that most of the mass of 
the posterior can easily be consolidated in high-localised domain regions and NS may need 
exceedingly long amounts of time to "find" these higher-density regions in a high-dimensional 
domain. Moreover, generating prior samples ab initio will quickly render the vast majority 
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of such samples discarded within just a few refinements of the likelihood constraint, quickly 
crippling the calculation due to growing inefficiency. 

To overcome the above issues, a special implementation of nested sampling was devel- 
oped for BEAST. Generally, this method uses externally calculated local maxima of the 
posterior as potential starting points of Markov Chain Monte Carlo (MCMC) iterations to 
efficiently generate prior samples, regardless of the current likelihood constraint. Given that 
the MCMC iterations are partially seeded by local maxima, we call this method of prior 
sampling optimal seeding. The prior sampling procedure is as follows: 

1. In addition to the pool of initial prior samples, a collection of local maxima of the 
posterior are calculated (using standard optimisation algorithms) and stored, this 
extended set of prior samples and local maxima is denoted S; 

2. ab initio sampling of the prior proceeds in the nested sampling iteration until a fixed 
number (in the results below this is taken to be 1000) of successive samples fail to meet 
the current likelihood constraint for a single attempt to replenish the prior sample 
pool. Subsequent prior samplings will immediately use MCMC iteration instead of 
re-attempting ab initio sampling of the prior. 

3. Once the above threshold is reached, an adaptive MCMC iteration is used to gener- 
ate new prior samples. This MCMC iteration takes a random member from S, as 
its starting point. The jump distribution for the MCMC chain is a Gaussian with 
dimensional standard deviations corresponding to the dimensional lengths of a min- 
imal volume hypercube bounding S. This jump distribution has its associated vari- 
ance scaled further-using acceptance rate data from chains in previous prior sample 
generations-to help ensure an optimal acceptance rate of 23.4%. This rate ensures 
optimal sampling efficiency for the chainpS The chain itself is designed to sample from 
a distribution proportional to the following: 

V(X), V(ii i ,a i \X)>jC i 

(A15) 

0, otherwise 

where £j represents the likelihood constraint for the present iteration. 

4. If the current likelihood constraint exceeds the associated likelihood value of any mem- 
ber of S, this member is removed, i.e. members that were included in S as local maxima 
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of the posterior can be removed under this condition. 

As the local maxima in S will generally have large likelihood values, these members will 
normally be potential starting points for prior MCMC chains after many iterations of nested 
sampling. In practice, optimal seeds are generated through a combination of conjugate 
gradient, Hooke/Jeeves and particle swarm optimisers (see Hassan, et. alP^ for details on 
the particle swarm heuristic). These seeds will typically be discarded after mS iterations, 
i.e. about half-way through the evidence calculation. As the bounding hypercube of the 
samples decreases (or will after some point) as the procedure continues, the jump distribution 
associated with the MCMC chains will become more resolved. This enables efficient sampling 
of the prior regardless of the likelihood constraint. Moreover, since S contains any number 
of local maxima, the algorithm will not waste time searching for regions of high probability 
density. 

This scheme provides a non-local approximation that captures the ambient structure of 
the posterior, as well as the finer structure around the local maxima included in S. Finally, 
as S represent only the starting points of the MCMC iterations, they do not serve to bias 
the sampling of the prior, if a sufficient number of jumps are taken in the MCMC iteration 
itself. In BEAST, MCMC chains are fixed to make twenty jump attempts, if fewer than 
three jumps actually occur, the current chain is discarded and a new MCMC is started from 
a new random starting point taken from S. 



Benchmarks 

Table [IT] contains a list of run parameters which can be set in a BEAST inference, along 
with the suggested minimal values of each associated parameter. Smaller values for each 
parameter lead to overall faster computation, at the cost of consistency or inferred results. 
The values in Tab. [IT] have been empirically observed to produce inferred results which are 
consistent with inferred uncertainties across different BEAST runs on the same MAST dis- 



charge using the same run parameters. However, inferences presented in { VII use higher run 
parameter values (c.f. Tab.|l]) to get better statistics on final inferences. 

The performance of BEAST most strongly depends on the number of prior samples one 
wishes to maintain in nested sampling, with the other run parameters having little to no 
discernible impact on performance or results when set above the corresponding values in 
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Run Parameter 


Variable Name 


Minimal values 


# of Prior Samples 


sizeSamplePool 


20 


# of Evidence Abscissa Samples 


numEvidenceSamples 


12 


# of 06 initio failures before using MCMC 


numAB IFailures 





# of attempted jumps for prior MCMC iterations 


numMCMC Jumps 


12 



TABLE II. Minimal run parameter values suggested for BEAST inference on MAST discharges. 



Tab.pH Specifically, to gain a stable uncertainty for the evidence, at least 12 samples should 
be taken of the evidence abscissa. One need not engage in naive sampling of the posterior 
at all and can simply use MCMC iteration for all prior sampling. However, it was seen that 
allowing for up to 1000 failures in naive prior sampling afforded the MCMC iterations to 
start off with much more resolved jump distributions, which subsequently made the overall 
evidence calculation time more consistent. While going below 20 MCMC jump attempts 
can make the overall computation faster, this was not heavily explored as it was desirable 
to have good assurance that the information contained in the MCMC starting point had 
sufficient time to dissipate (i.e. over the course of approximately 5 jumps) at the optimal 
acceptance rate of 23.4%. 

Skilling^-^ discusses the impact that the initial number of prior samples has on the 
evidence uncertainty, taking 2m£ as the termination criteria. Thus, a detailed discussion 



on this topic here will not be presented here. Table III gives an account of how the number 
of initial prior samples affected the time to calculate the evidence integral and what the 
computed uncertainties where for each of these calculations. Note that typically the evidence 
of the posterior is so large, that results from BEAST are actually output in terms of the 
natural logarithm of the evidence (log-evidence). 

While comparisons of the evidence between different BEAST runs with the same number 
of prior samples all produced results consistent with the reported uncertainties, using differ- 
ent numbers of starting prior samples will, generally showed an impact beyond the stated 
uncertainties. This impact was seen to be up to an order of magnitude above the stated 
uncertainties for the log-evidence. Regardless, this fluctuation is still small compared to the 
average expectation of the log-evidence typically seen in a MAST discharge. More specifi- 
cally, for MAST discharges, the log-evidence of the posterior will be calculated to normally 
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be somewhere between 500 and 1000 with initial prior sample numbers causing fluctuations 
that are on the order of 10. As BEAST comparisons are completed using the same num- 
ber of initial prior samples, relative comparison of the evidence remains meaningful in the 
context discussed in §IV A An overall fluctuation of the evidence across different starting 
numbers of prior samples is not surprising, as the termination criteria directly depends on 
this number. An attempt was made to see if there was a lower bound on the number of prior 
samples that would stabilise the evidence relative to bigger initial sampling pools; but this 
could not be determined, as some discharges had the evidence fluctuating up through an 
initial number of samplings that required more memory than was currently available. The 
search for this lower bound and the exploration of alternative termination criteria remains 
a focus of current research. 



# of Prior Samples 


Time to Compute Evidence (s) 


of quadrature points 


Log-Evidence 2a 


400 


21820.744 


42704 


0.560 


200 


9383.223 


19527 


1.012 


150 


7390.302 


15640 


1.179 


100 


5016.547 


9156 


1.572 


50 


2268.81 


5142 


1.590 



TABLE III. BEAST run times and evidence uncertainties for different numbers of initial prior 
samples using 25 abscissa samples. These statistics corresponds to BEAST analyse of MAST 
discharge #18696 at 290ms. 



Finally, the data in Tab. Ill corresponds to BEAST implemented as a module in the MIN- 
ERVA: a single-threaded, Bayesian java framework developed by SvenssonPl The hardware 
used to run BEAST for these tests was an iMac desktop PC running OS X 10.6.8 with a 
quad-core i7 processor clocking at 2.93GHz per core and having 8GB of memory. It should 
be noted that parallelising the NS algorithm is conceptually a straight-forward matter and 
would greatly reduce these computation times. This is a current research endeavour. 
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